Top-Down Cohesion Segmentation in Summarization
Abstract
The paper proposes a new method of linear text segmentation based on the lexical cohesion of a text. First, a single chain of disambiguated words in the text is established; then the rips (breaks) in this chain are taken as the boundaries of the segments of the cohesion text structure (Cohesion TextTiling, or CTT). Summaries of arbitrary length are obtained by extraction, using three different methods applied to the resulting segments. The informativeness of these summaries is compared with that of the corresponding summaries of the same length obtained using an earlier method of logical segmentation by text entailment (Logical TextTiling, or LTT). Experiments with the CTT and LTT methods on four "classical" texts from the summarization literature show that summarization using cohesion segmentation (CTT) gives better quality than summarization using logical segmentation (LTT).
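The idea of cutting a text where a lexical chain rips can be illustrated with a minimal Python sketch. The abstract does not say how the chain is built or how a rip is detected, so everything here is an assumption made for illustration: each sentence is represented as a set of already disambiguated sense labels, and a rip is taken to be any point where a sentence shares no sense label with the sentence before it. The function name `segment_by_chain_rips` and the sense labels are hypothetical, and this is not the paper's CTT algorithm.

```python
from typing import List, Set


def segment_by_chain_rips(sentence_senses: List[Set[str]]) -> List[List[int]]:
    """Group sentence indices into segments, cutting where the chain rips.

    Assumption (not from the paper): a rip occurs between two adjacent
    sentences that have no disambiguated sense label in common.
    """
    segments: List[List[int]] = []
    current: List[int] = []
    for i, senses in enumerate(sentence_senses):
        # Start a new segment when this sentence shares nothing with the previous one.
        if current and not (senses & sentence_senses[i - 1]):
            segments.append(current)
            current = []
        current.append(i)
    if current:
        segments.append(current)
    return segments


# Hypothetical sense labels for four sentences.
example = [
    {"bank#1", "money#1"},
    {"money#1", "loan#1"},
    {"river#1", "water#1"},
    {"water#1"},
]
print(segment_by_chain_rips(example))
```

In this toy input, sentences 0 and 1 share a label and so do sentences 2 and 3, while sentences 1 and 2 share none, so the sketch cuts between them. An extractive summarizer could then pick sentences from each resulting segment.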
Similar resources
Disunity in Cohesion: How Purpose Affects Methods and Results When Analyzing Lexical Cohesion
Lexical Cohesion is a commonly studied linguistic feature as it is easily identified from the surface of a text. However, the purposes for studying lexical cohesion are varied, and each purpose requires different methods. This study analyzes two short movie review texts for four different research purposes using lexical cohesion: text evaluation, text segmentation, text summarization, and text ...
Improving Text Segmentation with Non-systematic Semantic Relation
Text segmentation is a fundamental problem in natural language processing, with applications in information retrieval, question answering, and text summarization. Almost all previous work on unsupervised text segmentation is based on the assumption of lexical cohesion, which is indicated by relations between words in two units of text. However, they only take into account the reiteration,...
Implementation of an Automated Text Segmentation System Using Hearst’s Texttiling Algorithm
This paper describes the implementation of a text segmentation system based on Hearst’s TextTiling algorithm. Hearst is a pioneer in the field of text segmentation, and her algorithm has already been shown to provide good results. The algorithm uses lexical frequency and distribution information to recognize the level of cohesion between blocks of text, and then uses these cohesion estimates to...
Generating Reference Texts for Short Answer Scoring Using Graph-based Summarization
Automated scoring of short answers often involves matching a student's response against one or more sample reference texts. Each reference text provided contains very specific instances of correct responses and may not cover the variety of possibly correct responses. Finding or hand-creating additional references can be very time-consuming and expensive. In order to overcome this problem we prop...
Lexical cohesion, discourse segmentation and document summarization
Summaries automatically derived by sentence extraction are known to exhibit some coherence degradation, readability deterioration, and topical under-representation. We propose a strategy for improving upon these problems, aiming to generate more cohesive summaries by analyzing the lexical cohesion factors in the source document texts. As an initial experiment, we have looked at one particular f...